Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm
Abstract
In comparison with hard clustering methods, in which a pattern belongs to a single cluster, fuzzy clustering algorithms allow patterns to belong to all clusters with differing degrees of membership. This is important in domains such as sentence clustering, since a sentence is likely to be related to more than one theme or topic present within a document or set of documents. However, because most sentence similarity measures do not represent sentences in a common metric space, conventional fuzzy clustering approaches based on prototypes or mixtures of Gaussians are generally not applicable to sentence clustering. This paper presents a novel fuzzy clustering algorithm that operates on relational input data; i.e., data in the form of a square matrix of pairwise similarities between data objects. The algorithm uses a graph representation of the data, and operates in an Expectation-Maximization framework in which the graph centrality of an object in the graph is interpreted as likelihood. Results of applying the algorithm to sentence clustering tasks demonstrate that the algorithm is capable of identifying overlapping clusters of semantically related sentences, and that it is therefore of potential use in a variety of text mining tasks. We also include results of applying the algorithm to benchmark data sets in several other domains.
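The abstract describes the approach only at a high level, so the following is a minimal sketch of one way such an algorithm could look, not the authors' implementation. It assumes PageRank as the graph centrality measure, edge weights scaled by the cluster memberships of both endpoints, and random initial memberships; none of these details is stated in the abstract. The names `pagerank`, `fuzzy_relational_clustering`, `damping`, and `n_iter` are hypothetical.

```python
import numpy as np


def pagerank(weights, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a non-negative weight matrix (assumed centrality)."""
    n = weights.shape[0]
    col_sums = weights.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Column-normalise; a node with no edges spreads its score uniformly.
    transition = np.where(col_sums > 0, weights / safe, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        updated = (1.0 - damping) / n + damping * (transition @ scores)
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores


def fuzzy_relational_clustering(similarity, n_clusters, n_iter=50, seed=0):
    """EM-style fuzzy clustering on a square matrix of pairwise similarities."""
    rng = np.random.default_rng(seed)
    n = similarity.shape[0]
    # Assumption: random initial memberships, each row summing to 1.
    membership = rng.dirichlet(np.ones(n_clusters), size=n)
    mixing = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iter):
        likelihood = np.empty((n, n_clusters))
        for k in range(n_clusters):
            m = membership[:, k]
            # Assumption: scale each edge by both endpoints' membership in cluster k.
            weighted = similarity * np.outer(m, m)
            # Centrality within the cluster-weighted graph is read as a likelihood.
            likelihood[:, k] = pagerank(weighted)
        # E-step: membership is proportional to mixing coefficient times likelihood.
        joint = likelihood * mixing
        membership = joint / joint.sum(axis=1, keepdims=True)
        # M-step: mixing coefficients are the mean memberships.
        mixing = membership.mean(axis=0)
    return membership, mixing
```

Calling `fuzzy_relational_clustering(S, 2)` on a symmetric similarity matrix `S` returns a membership matrix whose rows sum to 1, so each object (for example, a sentence) can belong to several clusters to differing degrees, together with the cluster mixing coefficients. Because only pairwise similarities are used, no common metric space or cluster prototype is required.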
Related resources
Sentence Level Text Clustering using a Hierarchical Fuzzy Relational Clustering Algorithm
Clustering is the process of grouping or aggregating data items. Sentence clustering is mainly used in a variety of applications such as classification and categorization of documents, automatic summary generation, document organization, etc. In text processing, sentence clustering plays a vital role and is used in text mining activities. The size of the clusters may vary from one cluster to another....
Optimal Sentence Clustering Using An Innovative Hierarchical Fuzzy Clustering Algorithm
Data clustering plays an indispensable role in many text processing activities. Much work is under way in this area because of its wide range of applications. Sentence clustering is a challenging task compared with other data clustering tasks, because a sentence can express the same idea in different ways. For example, some people see a glass as half empty and others see it as half full. Due to this...
Clustering Sentence-Level Text Using a Fuzzy Back-Propagation Clustering Algorithm
In comparison with hard clustering methods, in which a pattern belongs to a unique cluster, clustering algorithms with fuzziness allow patterns to belong to all clusters with differing degrees of membership. This is important in domains such as sentence clustering, as a sentence may belong to more than one topic present within a document or set of documents. Since most sentence similarity measure...
Survey on Clustering Algorithm for Sentence Level Text
Clustering is an extensively studied data mining problem in the text domain. The problem finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. In text mining, sentence clustering is one of the processes used within general text mining tasks. Several clustering methods and algorithms are used...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Publication date: 2014